Hilberg’s Conjecture — a Challenge for Machine Learning
Abstract
We review three mathematical developments linked with Hilberg’s conjecture, a hypothesis about the power-law growth of the entropy of natural-language texts that poses a challenge for machine learning. First, considerations concerning maximal repetition indicate that universal codes such as the Lempel-Ziv code may fail to efficiently compress sources that satisfy Hilberg’s conjecture. Second, Hilberg’s conjecture implies the empirically observed power-law growth of vocabulary in texts. Third, Hilberg’s conjecture can be explained by the hypothesis that texts consistently describe an infinite random object.
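For orientation, the power-law growth mentioned in the abstract can be written out as follows. This is one common formalization, assumed here for illustration and not quoted from the paper: the symbols H(n) (entropy of a block of n characters), B, h, and β are introduced for this sketch, and the block process is assumed stationary.

```latex
\[
  H(n) \approx B\,n^{\beta} + h\,n, \qquad 0 < \beta < 1,\; B > 0,\; h \ge 0 .
\]
% Mutual information between two adjacent blocks of length n:
\[
  I(n) = 2H(n) - H(2n) \approx \left(2 - 2^{\beta}\right) B\,n^{\beta} .
\]
```

Under these assumptions the linear term h n cancels, so the mutual information between adjacent blocks inherits the power law with the same exponent β.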
Similar resources
Empirical Evidence for Hilberg’s Conjecture in Single-Author Texts
Hilberg’s conjecture is a statement that the mutual information between two adjacent blocks of text in natural language scales as n^β, where n is the block length. Previously, this hypothesis has been linked to Herdan’s law on the levels of word frequency and of text semantics. Thus it is worth a direct empirical test. In the present paper, Hilberg’s conjecture is tested for a selection of Engli...
Hilberg’s Conjecture: an Updated FAQ
This note is a brief introduction to theoretical and experimental results concerning Hilberg’s conjecture, a hypothesis about natural language. The aim of the text is to provide a short guide to the literature. 1 What is Hilberg’s conjecture? In the early days of information theory, Shannon (1951) published estimates of conditional entropy for printed English. A few decades later, Hilberg (1990...
A Hybrid Machine Learning Method for Intrusion Detection
Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Various artificial intelligence techniques have already been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...
A Preadapted Universal Switch Distribution for Testing Hilberg's Conjecture
Hilberg’s conjecture states that the mutual information between two adjacent long blocks of text in natural language grows like a power of the block length. The exponent in this hypothesis can be upper bounded using the pointwise mutual information computed for a carefully chosen code. The lower the compression rate, the better the bound, but there is a requirement that the code be univers...
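The entries on testing Hilberg’s conjecture describe pointwise mutual information computed for a code. A minimal sketch of that idea, assuming zlib as a stand-in compressor (it is not the switch distribution of the listed paper and is not a universal code in the required sense), estimates the quantity as C(left) + C(right) - C(left + right), where C is the compressed length. The function names `code_length`, `pointwise_mi`, and `random_block` are hypothetical helpers introduced for this illustration.

```python
import random
import zlib


def code_length(data: bytes) -> int:
    """Length in bytes of the zlib encoding of data (stand-in code, an assumption)."""
    return len(zlib.compress(data, 9))


def pointwise_mi(left: bytes, right: bytes) -> int:
    """Code-length estimate C(left) + C(right) - C(left + right), in bytes."""
    return code_length(left) + code_length(right) - code_length(left + right)


def random_block(seed: int, n: int) -> bytes:
    """Deterministic pseudo-random block of n bytes, used only as toy input."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    x = random_block(0, 2000)
    y = random_block(1, 2000)
    # Identical adjacent blocks share all their information.
    print("identical blocks:  ", pointwise_mi(x, x))
    # Independent blocks should share almost none.
    print("independent blocks:", pointwise_mi(x, y))
```

To probe a power law with real text, one would apply `pointwise_mi` to adjacent blocks of increasing length n and inspect the growth on a log-log scale. Note that zlib only looks back about 32 KiB, so this stand-in cannot see dependencies between longer blocks.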